Optimal and Approximate Q-value Functions for Decentralized POMDPs
Authors: Frans A. Oliehoek, Matthijs T. J. Spaan, Nikos Vlassis
Abstract
Decision-theoretic planning is a popular approach to sequential decision-making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extr...
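As a concrete illustration of the single-agent case the abstract describes, here is a minimal sketch of computing an optimal Q-value function Q* by dynamic programming and extracting a greedy policy from it. It is not taken from the paper; the MDP below is a randomly generated placeholder.

import numpy as np

# Toy MDP with placeholder dynamics: T[s, a, s'] transition probabilities,
# R[s, a] immediate rewards, discount factor gamma.
n_states, n_actions, gamma = 3, 2, 0.95
rng = np.random.default_rng(0)
T = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.standard_normal((n_states, n_actions))

# Dynamic programming on Q:
# Q*(s,a) = R(s,a) + gamma * sum_s' T(s,a,s') * max_a' Q*(s',a')
Q = np.zeros((n_states, n_actions))
for _ in range(10_000):
    Q_new = R + gamma * T @ Q.max(axis=1)
    converged = np.abs(Q_new - Q).max() < 1e-10
    Q = Q_new
    if converged:
        break

policy = Q.argmax(axis=1)  # an optimal policy is extracted greedily from Q*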
Similar resources

Q-value Heuristics for Approximate Solutions of Dec-POMDPs
The Dec-POMDP is a model for multi-agent planning under uncertainty that has received increasing attention in recent years. In this work we propose a new heuristic, Q_BG, that can be used in various algorithms for Dec-POMDPs, and describe its differences from and similarities to Q_MDP and Q_POMDP. An experimental evaluation shows that, at the price of some computation, Q_BG gives a consistently ...
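The Q_MDP heuristic compared against above admits a very small sketch. The code below is my own illustration of the standard idea, not the paper's implementation; T, R, and gamma are assumed to describe the underlying fully observable MDP.

import numpy as np

def q_mdp(T, R, gamma, iters=10_000, tol=1e-10):
    # Q-values of the underlying MDP, ignoring partial observability.
    Q = np.zeros(R.shape)
    for _ in range(iters):
        Q_new = R + gamma * T @ Q.max(axis=1)
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new
    return Q

def q_mdp_heuristic(belief, Q):
    # Score each action at belief b: Q_MDP(b, a) = sum_s b(s) * Q_MDP(s, a).
    # This upper-bounds the true value, since it implicitly assumes the
    # state becomes fully observable after one step.
    return belief @ Q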
Optimal Fixed-Size Controllers for Decentralized POMDPs
Solving decentralized partially observable Markov decision processes (DEC-POMDPs) is a difficult task. Exact solutions are intractable in all but the smallest problems and approximate solutions provide limited optimality guarantees. As a more principled alternative, we present a novel formulation of an optimal fixed-size solution of a DEC-POMDP as a nonlinear program. We discuss the benefits of...
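To give the flavor of such a nonlinear program, here is a hedged sketch written for a single fixed-size stochastic controller (the cited work treats the joint, multi-agent case; the notation below is mine). The variables are node values V(q,s), action probabilities P(a|q), and node-transition probabilities P(q'|q,a,o); b_0 is the initial belief and q_0 the start node:

% Sketch of a fixed-size-controller NLP (single-controller flavor, my notation)
\max_{V,\,P}\ \sum_{s} b_0(s)\, V(q_0, s)
\quad \text{subject to} \quad
V(q,s) = \sum_{a} P(a \mid q) \Big[ R(s,a)
  + \gamma \sum_{s'} T(s' \mid s, a) \sum_{o} O(o \mid s', a)
    \sum_{q'} P(q' \mid q, a, o)\, V(q', s') \Big] \quad \forall\, q, s,

with P(. | q) and P(. | q, a, o) additionally constrained to be probability distributions. Fixing the number of controller nodes keeps the program finite while the Bellman-style equality constraints enforce consistency of the node values.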
Towards Computing Optimal Policies for Decentralized POMDPs
The problem of deriving joint policies for a group of agents that maximize some joint reward function can be modelled as a decentralized partially observable Markov decision process (DEC-POMDP). Significant algorithms have been developed for single-agent POMDPs; however, with a few exceptions, effective algorithms for deriving policies for decentralized POMDPs have not been developed. As a first ...
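To see why deriving decentralized policies is hard, consider a deliberately tiny toy problem (my own construction with placeholder numbers, not an example from the paper): each agent conditions only on its local observation, so we must search the space of joint policies rather than solve a single-agent problem.

import itertools
import numpy as np

n_states, n_obs, n_actions = 2, 2, 2
rng = np.random.default_rng(1)
b0 = np.full(n_states, 1.0 / n_states)                      # prior over the hidden state
O = rng.dirichlet(np.ones(n_obs), size=(2, n_states))       # O[i][s, o]: agent i's observation model
Rj = rng.standard_normal((n_states, n_actions, n_actions))  # joint reward R(s, a1, a2)

# Exhaustive search over joint policies; each agent's policy maps its LOCAL
# observation to an action. This space grows doubly exponentially with the
# horizon, which is the source of Dec-POMDP hardness.
best_value, best_joint_policy = -np.inf, None
for pi1 in itertools.product(range(n_actions), repeat=n_obs):
    for pi2 in itertools.product(range(n_actions), repeat=n_obs):
        value = sum(
            b0[s] * O[0][s, o1] * O[1][s, o2] * Rj[s, pi1[o1], pi2[o2]]
            for s in range(n_states)
            for o1 in range(n_obs)
            for o2 in range(n_obs)
        )
        if value > best_value:
            best_value, best_joint_policy = value, (pi1, pi2)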
Value of Communication In Decentralized POMDPs
In decentralized settings with partial observability, agents can often benefit from communicating, but communication resources may be limited and costly. Current approaches tend to dismiss or underestimate this cost, resulting in overcommunication. This paper presents a general framework to compute the value of communicating from each agent’s local perspective, by comparing the expected reward ...
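Read literally, the decision rule this abstract describes can be sketched as below. This is my paraphrase of the framework, not the paper's code; expected_reward is a hypothetical placeholder for whatever local policy-evaluation routine an agent has available.

def should_communicate(local_belief, expected_reward, comm_cost):
    # Value of communication from the agent's local perspective: the expected
    # reward if it synchronizes with teammates, minus the expected reward of
    # staying silent, compared against the cost of sending the message.
    gain = (expected_reward(local_belief, communicate=True)
            - expected_reward(local_belief, communicate=False))
    return gain > comm_cost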
Journal
Title: Journal of Artificial Intelligence Research
Year: 2008
ISSN: 1076-9757
DOI: 10.1613/jair.2447